DYNAMICALLY EVALUATING AN ELECTRONIC COMMERCE
BUSINESS MODEL THROUGH CLICK STREAM ANALYSIS 



TECHNICAL FIELD 

The present invention relates in general to dynamic click stream 
analysis and, in particular, to a system and method for dynamically evaluating 
an electronic commerce business model through click stream analysis. 

BACKGROUND OF THE INVENTION

The methods and means for transacting commerce continue to evolve 
in close step with technological advances. For instance, the traditional ways 
of selling goods and services, the so-called "brick and mortar" approach, have 
expanded into remote sales through mail order and telephonic catalog sales 
and television-based shopping "networks." Electronic commerce ("e- 
commerce") presents the latest approach to transacting remote sales and 
related commerce. 

E-commerce is primarily computer network-based and requires a three- 
part support infrastructure. First, individual consumers must have some form 
of client computer system, such as a personal computer typically executing a 
browser application. Second, businesses must field a host computer system 
executing a server application and an associated database. The database 
ordinarily stores information on the goods and services offered. Finally, the 
host computer system must be interconnected to each client computer system 
via a data network or similar form of interconnectivity. The data network can 
include intranetworks, also known as local area networks, and wide area 
networks, including public information internetworks, such as the Internet, and 
any combination thereof. 

Most e-commerce systems are Web-based. Typically, the host 
computer system executes a server application for presenting a Web site that 
creates a virtual, user-readable "storefront." The storefront is actually a series 
of downloadable Web pages structured in a hierarchical manner with 
embedded hyperlinks connecting to other related Web pages and content. The
Web site is organized as a catalog of goods and services and includes means 
for secure purchasing. During operation, consumers transact commerce in a 
purchasing session consisting of requests for Web pages sent to and replies 
received from the host computer system.

E-commerce differs from traditional means of commerce in several
respects. Unlike traditional methods, the bulk of interaction between the
consumer and vendor is through an impassive computer system and there is
generally little to no opportunity to offer person-to-person, individualized
sales and service. As well, the immediacy of purchasing and ease of
comparison shopping result in low customer loyalty. Moreover, competitive
drivers short-circuit the selling process by pro-actively soliciting sales with
targeted specials sold at low margins. These competitive drivers work to
entice a consumer to visit a competing vendor's Web site, potentially resulting
in lost sales. E-commerce vendors attempt to address these differences by
incorporating presentation and demographic business models into their Web
sites.

Presentation models describe the physical layout and functionality of a
virtual storefront. Presentation models are the Web-based equivalent of
conventional consumer marketing. However, the effectiveness of a
presentation model is difficult to judge due to the lack of subjective customer
feedback. Conventional measurement methodologies for brick-and-mortar
storefronts fail to provide an adequate solution. For instance, sales
volumes and repeat Web site visits only partially reflect a Web site's
effectiveness. Incomplete transactions and failed product searches are
typically neither measured nor analyzed, yet could provide valuable insight
into a Web site's effectiveness.

The demographic model implements the actual sales model based on
statistical and behavioral models of measured and predicted consumer buying
habits. Conventionally, demographic data is fairly static and is generally
collected and processed periodically to determine consumer behavioral and
purchasing trends. Demographic analysis is generally performed through
applied artificial intelligence and statistical modeling. Persuasive factors and
dependent variables are identified and weighed and, if necessary, new
demographic models are built. However, e-commerce-based demographics
tend to fluctuate much more rapidly than conventional demographics and
periodic processing can result in lost sales volume. Depending upon the
e-commerce Web site, both presentation and demographic models can age at an
unknown rate.

In the prior art, click stream analysis has been used to evaluate
e-commerce business models. Theoretically, every consumer's visit can be
tracked, step-by-step, by collecting and storing the "click stream" of Web
pages and content selections made during a given visit to the vendor's Web
site. These click streams can be analyzed to determine purchasing trends and
consumer behaviors. However, click stream analysis has historically not been
performed due to the extremely high volume of traffic. Moreover, the off-line
processing techniques used to evaluate demographic models are based on
relatively static data sets. Such processing techniques are slow and ill-suited
for dynamic e-commerce applications.

Therefore, there is a need for an approach to dynamically analyzing 
and evaluating business models incorporated into e-commerce Web sites. 
Preferably, such an approach would utilize click stream data representing a 
path through a Web site. Such an approach could be used to validate
presentation and demographic models in a responsive, potentially near
real-time manner.

There is a further need for an approach to collecting and analyzing 
large data sets of on-line streams of Web page and content selections. Such an 
approach could be used to form structured data sets amenable to conventional 
data mining techniques.

DISCLOSURE OF INVENTION

The present invention provides a system and method for evaluating an
e-commerce business model through on-the-fly click stream analysis. Click
streams are collected and analyzed concurrently with on-going Web server
operations. Each click stream records a path through the e-commerce Web
site representing the selections of Web pages and content made by visiting
consumers. The click stream paths are stored as data vectors, preferably
reduced in size, and classified. The click stream paths are then analyzed using
analytical processing and data mining techniques and the e-commerce
business models are validated. If necessary, new models are generated.

An embodiment of the present invention is a system and method for 
transacting electronic commerce via a hierarchically structured Web site with 
click stream feedback. A plurality of individual Web pages structured in a 
hierarchical manner as a Web site are served. Each Web page includes one or 
more hyperlinks selectable by a user to provide at least one of another Web 
page and content. A click stream for each user session on the Web site is 
collected. The click stream includes data entries recording one or more of the 
hyperlinks selected during the user session. Each collected click stream is 
processed. The collected click streams are classified into one or more pre-
defined categories describing characteristics shared by a plurality of the click
streams. The collected click streams are analyzed. One or more structured
queries are executed on the collected click streams based on at least one of the 
data entries and the shared characteristics. Alternatively or in addition thereto, 
data mining is performed on the collected click streams by assigning an 
independent variable and at least one dependent variable and determining the 
relative weightings thereon. 

A further embodiment of the present invention is a system and method 
for dynamically evaluating an e-commerce business model through click 
stream analysis. An e-commerce business model is incorporated into a Web 
site. The Web site includes a plurality of related Web pages structured in a 
hierarchical manner. Each Web page includes one or more hyperlinks 
selectable by a user. A plurality of data vectors is stored. Each data vector 
represents a click stream path through the Web site. Each data vector includes 
a set of data entries that each corresponds to Web content selected via the 
hyperlinks in the related Web pages. Each click stream path is classified 
based on at least one such data entry in the data vector. The classified click 
stream path shares at least one common characteristic with one or more other 
click stream paths. The classified click stream paths are analyzed according to 
a pre-defined evaluation procedure directed at determining at least one of 
efficacy of presentation and shifts in demography. The e-commerce business 
model is compared to the classified click stream paths analysis. 



Still other embodiments of the present invention will become readily 
apparent to those skilled in the art from the following detailed description, 
wherein are described embodiments of the invention by way of illustrating the
best mode contemplated for carrying out the invention. As will be realized, 
the invention is capable of other and different embodiments and its several 
details are capable of modifications in various obvious respects, all without 
departing from the spirit and the scope of the present invention. Accordingly, 
the drawings and detailed description are to be regarded as illustrative in 
nature and not as restrictive. 

DESCRIPTION OF THE DRAWINGS 

FIGURE 1 is a block diagram showing a distributed computing 
environment, including a system for dynamically evaluating an e-commerce 
business model through click stream analysis, in accordance with the present 
invention. 

FIGURE 2 is a detail block diagram showing the system for 
dynamically evaluating an e-commerce business model through click stream 
analysis of FIGURE 1. 

FIGURE 3 is a flow diagram showing a sample e-commerce order 
processing sequence. 

FIGURE 4 is a tree diagram showing, by way of example, a set of 
potential click stream paths through a hierarchically structured Web site. 

FIGURE 5 is a block diagram showing, by way of example, a data path
vector.

FIGURE 6 is a table diagram showing, by way of example, a matrix of 
demographic data values. 

FIGURE 7 is a block diagram showing the functional software 
modules of the server of FIGURE 2. 

FIGURE 8 is a flow diagram showing a method for dynamically 
evaluating an e-commerce business model through click stream analysis in 
accordance with the present invention. 

FIGURE 9 is a flow diagram showing a routine for collecting and 
analyzing click streams for use in the method of FIGURE 8. 

FIGURE 10 is a flow diagram showing a routine for reducing data 
vector sizes for use in the method of FIGURE 9. 



FIGURE 11 is a flow diagram showing a routine for classifying a click
stream path for use in the method of FIGURE 9. 

FIGURE 12 is a flow diagram showing a routine for analyzing data 
stream paths for use in the method of FIGURE 9. 

FIGURE 13 is a flow diagram showing a routine for validating e- 
commerce business models for use in the method of FIGURE 9. 

BEST MODE FOR CARRYING OUT THE INVENTION 

FIGURE 1 is a block diagram showing a distributed computing 
environment 9, including a system 10 for dynamically evaluating an e- 
commerce business model through click stream analysis, in accordance with 
the present invention. The system 10 consists of a server 11 operating on a
host computer system that serves Web pages and content to a plurality of 
clients. 

Various types of clients can be interconnected to the server 11. These
clients include a local client 12 interconnected directly to the server 11 and a
dial-in client 13 interconnected via a set of modems 14. In addition, a network
client 15 can be interconnected through an Internet service provider (ISP) 16
that is interconnected to the server 11 via an internetwork 17, including the
Internet. Similarly, one or more local area network (LAN) clients 18 can be
interconnected to the server 11 via an intranetwork 19 that is itself
interconnected to the internetwork 17 via a router 20 or similar device. Other 
types of clients, network topologies and configurations, and forms of 
interconnection are feasible. 

In addition to performing those tasks ordinarily associated with hosting 
network services, the server 11 executes two principal applications: an active
server page (ASP) server 21 and a click stream analyzer 22. The ASP
server 21 functions as the primary interface to the individual clients through a 
dynamically generated virtual "storefront" for transacting e-commerce. The 
server 11 includes a secondary storage device 23 in which databases 24 and
ancillary files 25 are maintained. The databases 24 and ancillary files 25 are 
further described below with reference to FIGURE 2. 

The virtual storefront is implemented as a Web site that is accessible to
the clients over the "Web." The Web, shorthand for "World Wide Web,"
loosely refers to session-oriented data communications occurring in a
networked computing environment and conforming to the Hypertext Transfer
Protocol (HTTP). HTTP communications usually occur over Transmission
Control Protocol/Internet Protocol-based (TCP/IP) data networks, although
other types of packet switched data networks also support HTTP. The HTTP
suite is described in W.R. Stevens, "TCP/IP Illustrated," Vol. 3, Chs. 13-14,
Addison-Wesley (1996), and the TCP/IP suite is described in W.R. Stevens,
"TCP/IP Illustrated," Vol. 1, Ch. 1 et seq., Addison-Wesley (1994), the
disclosures of which are incorporated herein by reference.

The virtual storefront Web site is organized as a catalog of goods and 
services. Preferably, the Web site includes means for making secure 
purchases. The Web site consists of a series of dynamically generated, 
individually downloadable Web pages. The Web pages are structured in a 
hierarchical manner. Consumers navigate through the Web site by selecting 
linked Web pages and content using the embedded hyperlinks via a browser 
application 26. Browser applications 26 suitable for use in the present 
invention include the Internet Explorer, licensed by Microsoft Corporation, 
Redmond, Washington, and the Navigator, licensed by Netscape Corporation, 
Mountain View, California. 

Each Web page or content selection by a consumer constitutes a
"click," that is, an affirmative selection of a linked item or action specified via
a client input means, such as a keyboard, mouse or similar input device. The 
series of selections made by a consumer during any given e-commerce session 
constitutes a click stream. The click streams for the population of consumers 
visiting a vendor Web site are analyzed by a click stream analyzer 22 for use 
in dynamically evaluating e-commerce business models, as further described 
below with reference to FIGURE 2. 

The individual computer systems, including the server 11 and clients
12, 13, 15, 18, are general purpose, programmed digital computing devices 
consisting of a central processing unit (CPU), random access memory (RAM), 
non-volatile secondary storage, such as a hard drive or CD ROM drive, 
network interfaces, and peripheral devices, including user interfacing means, 
such as a keyboard and display. Program code, including software programs, 
and data are loaded into the RAM for execution and processing by the CPU 
and results are generated for display, output, transmittal, or storage. 



FIGURE 2 is a detail block diagram showing the system 10 for
dynamically evaluating an e-commerce business model through click stream
analysis of FIGURE 1. There are seven main sets of databases 24: cookies 31,
active server page (ASP) scripts and Web pages 32, catalog 33, order
fulfillment 34, demography 35, click streams 36, and models 37. The ASP
server 21 uses the cookies 31, active server page (ASP) scripts and Web pages
32, catalog 33, and order fulfillment 34 databases. The click stream analyzer 22
uses the demography 35, click streams 36, and models 37 databases. In the 
described embodiment, the databases operate under a relational database 
management system, such as Oracle 7, licensed by Oracle Corporation, 
Redwood Shores, California. 

The cookies database 31 stores user profile information on individual
clients indexed by a unique set of "cookies." Each cookie is a unique, 256- 
byte data value assigned to registered consumers. Individual cookies are 
stored by the browser applications 26 (shown in FIGURE 1) and sent to the 
server 11 at the start of a transaction session. On most browser applications
26, cookies are optional. However, when enabled, cookies can allow personal 
and demographic information to be linked dynamically to a given consumer 
instead of generic consumer information. 

The ASP scripts and Web pages ("scripts") database 32 stores the 
virtual storefront. The ASP server 21 (shown in FIGURE 1) generates the 
individual Web pages and content for the Web site. The ASP server 21 
executes the ASP scripts and Web page code stored in the scripts database 32. 
The ASP server 21 interprets server-executable ASP scripts embedded within 
default Web pages to customize the Web page to each consumer based on the 
demographic model 38 and presentation model 39. Each Web page is written 
as a script in a tag-delimited, page description programming language, such as 
the Hypertext Markup Language (HTML) or the Extensible Markup Language 
(XML). Each Web page preferably includes embedded hyperlinks connecting 
that Web page to other related Web pages and content, such as files, images, 
dialogue boxes, and the like. 

In the described embodiment, the Active Server Page technology, 
licensed by Microsoft Corporation, Redmond, Washington, is used. Upon 
execution by the ASP server 21, the ASP scripts are converted into pure Web
content, typically written in HTML or XML. In the described embodiment,
the Active Server Pages are written as either JavaScripts or VBScripts, both of 
which are described in A.K. Weissinger, "ASP in a Nutshell, A Desktop Quick 
Reference," Chs. 1-3, O'Reilly & Assocs. (1999), the disclosure of which is 
incorporated herein by reference. 

The catalog database 33 stores information about the goods and 
services offered by the vendor. The ASP server 21 incorporates this 
information into Web pages. Similarly, the order fulfillment database 34 
stores information necessary to complete an order placed by a consumer. This 
information can include individual consumer data, shipping options and rates, 
tax rates, and related data necessary to completing a transaction. 

The demography database 35 stores consumer demographic 
information, both on an individualized and categorical basis. Demographic 
information is relatively stable information that changes at a slow rate. Such
information can include the age, income, personal traits, and geographic 
location for a consumer, as well as statistically derived information about 
other consumers sharing the same relative characteristics. Demographic 
information can be purchased from data research companies as well as derived 
by the vendor based on an analysis of buying trends on their Web site. 

The click streams database 36 stores data vectors representing click 
streams. One click stream is created per consumer session. Each selection 
made by a consumer, that is, each "click," is recorded as a data entry in a data 
vector, as further described below with reference to FIGURE 5. A selection 
can correspond to a Web page or content, as identified by a Uniform Resource 
Locator (URL). 

The models database 37 includes two ancillary files 25, demographic 
model 38 and presentation model 39. These models are the e-commerce 
business models used by a vendor and are incorporated into a Web site 
through execution of the ASP scripts and Web page code. The demographic 
model 38 implements assumptions on consumer behavior derived through 
artificial intelligence and statistical modeling techniques, as are known in the 
art. The demographic model 38 is structured into categories that are tied to 
individual consumers through the cookies database 31. 



The presentation model 39 implements the actual appearance and 
functionality of a Web site. The presentation model 39 defines Web page 
organization and format, as well as implements searching strategies and ease- 
of-use issues. 

FIGURE 3 is a flow diagram showing a sample e-commerce order 
processing sequence 50. In this sample sequence, a client 51 transacts an 
order with a vendor executing an e-commerce Web site on the server 11. At 
the outset of a session, a browser application 26 (shown in FIGURE 1) 
executing on the client 51 sends a cookie to the server 11 (operation 1) that
retrieves a user profile from the cookies database 31 (not shown). Based on
the retrieved user profile (operation 2), the demographic information for that
consumer and the demographic and presentation models are retrieved from the
demography database 35 and models database 37 (operation 3). The server
generates Web pages that are served to the consumer (operation 4).

The remainder of the order processing sequence is determined by user
selections ("clicks"). Each user selection (operation 5) is stored in a vector
created for the session in the click streams database 36 and, in response,
appropriate Web pages and content are returned (operation 6). For instance,
if the consumer sends a request for product information (operation 7) or
executes a search, a list of consumer information is retrieved from the catalog
database 33 (operation 8), or similar content is returned to the consumer.
Analogously, if the consumer places an order (operation 9), individualized
user data and business transaction information is retrieved from the order
fulfillment database 34 (operation 10). This sequence of operations is
repeated, in full or in part, for each consumer ordering session and a click 
stream vector is stored for dynamically analyzing the underlying e-commerce 
business models, as further described below with reference to FIGURE 8 et 
seq. 

FIGURE 4 is a tree diagram showing, by way of example, a set of 
potential click stream paths 60 through a hierarchically structured Web site. 
Each node in the tree represents a Web page identified by a unique URL and 
the lines connecting the nodes represent selections ("clicks"). For instance, a 
company named "Acme Corporation" might have a Web site with a Welcome 
page having the URL of "http://www.acme.com/home.htm." A consumer
would enter this URL into the browser application 26 (shown in FIGURE 1)
and a request for that page, including a cookie, would be sent to the server 11.
In reply, the server would generate a customized Welcome page and serve that
Web page to the browser application 26.

The first sample click stream path (nodes 61-65) illustrates a product
purchase. Briefly, a product purchase ordinarily requires placing a product (or
service) into a temporary holding bin, sometimes called a "shopping basket,"
for the session, editing the order, verifying consumer information, and
transacting the purchase. Thus, upon receiving the customized Welcome page
(node 61), a consumer would place an item for purchase into the shopping
basket by selecting a Basket page (node 62). Upon completion of shopping,
the consumer would review all of the items selected for purchase by selecting
an Order page (node 63). The consumer would verify the selected items by
selecting a Verify page (node 64). Finally, the consumer would transact the
purchase by selecting a secure Buy page (node 65).

The second sample click stream path (nodes 61, 66-68) illustrates an
incomplete product purchase. Briefly, an incomplete product purchase occurs
whenever a session ends without the consumer transacting a purchase. This
scenario covers an almost unlimited set of potential click stream paths, but the
most common path generally involves a failed product search. Thus, upon
receiving the customized Welcome page (node 61), a consumer might begin
by searching for a particular item using a Search page (node 66). After several
rounds of unsuccessful searching, the consumer would decline purchasing
products by selecting a No Order page (node 67) and would then resume
"browsing" of the Web site by selecting a general purpose Shopping page
(node 68) offered via a hyperlink to the Welcome page.

The third sample click stream path (nodes 61, 69) illustrates cross
sales. Cross sales reflect business affiliations between separate business
entities and are effected by embedding hyperlinks to an affiliated Web site into
their respective Web sites. Thus, as before, upon receiving the customized
Welcome page (node 61), a consumer might immediately decide to jump to an
affiliated Web site by selecting a Link page (node 69).



The foregoing click stream paths are merely illustrative and countless 
variations of click stream paths, including paths that include pushed Web 
content originating from the server 11 without user selection, are feasible.
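
The three sample paths suggest how a click stream path might later be assigned to a pre-defined class. The following is a minimal sketch only: the class names, the short page labels, and the rule that a Buy page marks a purchase and a Link page marks a cross sale are assumptions made for illustration and are not prescribed by the description.

```python
from typing import List


def classify_path(entries: List[str]) -> str:
    """Assign a click stream path to one pre-defined class (hypothetical rules)."""
    if "buy" in entries:
        # The session reached the secure Buy page, so a purchase was transacted.
        return "purchase"
    if "link" in entries:
        # The consumer jumped to an affiliated Web site.
        return "cross_sale"
    # The session ended without a purchase.
    return "incomplete_purchase"


# Page labels are illustrative stand-ins for nodes 61-69 of FIGURE 4.
first_path = ["welcome", "basket", "order", "verify", "buy"]
second_path = ["welcome", "search", "no_order", "shopping"]
third_path = ["welcome", "link"]

for path in (first_path, second_path, third_path):
    print(classify_path(path))
```

A production classifier would likely use richer categories, for instance ones tied to the demographic model, but the same idea applies: at least one data entry in the vector decides the class.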

FIGURE 5 is a block diagram showing, by way of example, a data path 
vector 80. This data path vector 80 stores the second sample click stream path 
(nodes 61, 66-68) of FIGURE 4 as a set of data entries 81-86. Each data entry 
corresponds to the URL of the Web content upon each successive "click." 
Thus, the Welcome page (node 61) is stored as data entry 81. Assuming the
consumer selected the Search page three times (node 66) before giving up,
each search attempt would be stored in successive data entries 82-84.
Similarly, the No Order page (node 67) and Shopping page (node 68) would 
be stored as data entries 85 and 86, respectively. 
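
A minimal sketch of how the data path vector 80 for this example might be built, with one data entry appended per click. The `DataPathVector` class, its `record` method, and the page paths are hypothetical names chosen for illustration; the description does not prescribe a concrete layout.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataPathVector:
    """One click stream path: an ordered list of data entries (hypothetical layout)."""
    session_id: str
    entries: List[str] = field(default_factory=list)

    def record(self, url: str) -> None:
        # Each successive "click" appends one data entry.
        self.entries.append(url)


# Second sample path of FIGURE 4: Welcome, three Search attempts, No Order, Shopping.
# The page paths below are illustrative assumptions, not URLs from the description.
vector = DataPathVector(session_id="session-1")
for url in ["/home.htm", "/search.htm", "/search.htm", "/search.htm",
            "/noorder.htm", "/shopping.htm"]:
    vector.record(url)

print(len(vector.entries))  # six entries, corresponding to data entries 81-86
```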

Each data vector 80 is stored in the click streams database 36 (shown 
in FIGURE 2). Note the full URL need not be stored. The volume of data 
generated by storing every click stream path for every consumer can quickly 
add up. Consequently, only that descriptive information minimally required to 
uniquely identify each Web page need be stored in the data path vector 80. 
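
One way to store less than the full URL is sketched below, under the assumption that each Web page is assigned a small integer identifier the first time it is seen. The `PageIndex` class and the search page URL are hypothetical; only the Welcome page URL appears in the description.

```python
from typing import Dict, List


class PageIndex:
    """Maps each distinct URL to a small integer identifier (hypothetical scheme)."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}
        self._urls: List[str] = []

    def encode(self, url: str) -> int:
        # Assign the next free identifier the first time a URL is seen.
        if url not in self._ids:
            self._ids[url] = len(self._urls)
            self._urls.append(url)
        return self._ids[url]

    def decode(self, page_id: int) -> str:
        # Recover the full URL when it is needed for reporting.
        return self._urls[page_id]


index = PageIndex()
path = ["http://www.acme.com/home.htm",
        "http://www.acme.com/search.htm",   # assumed URL, for illustration only
        "http://www.acme.com/search.htm"]
encoded = [index.encode(url) for url in path]
print(encoded)
```

The data path vector then holds the short identifiers, while the full URLs are kept once in the index.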

FIGURE 6 is a table diagram showing, by way of example, a matrix of 
demographic data values 90. These values quantify demographic information 
and consist of an independent variable 97 and one or more dependent variables 
91-96. A plurality of records 98 are generated over time for detecting shifts in 
the relative weightings of these variables using data mining techniques, as 
further described below with reference to FIGURE 13. 

By way of example, two data records are shown. The independent 
variable 97 reflects the total amount of money spent (Total $) and the 
dependent variables reflect the total amount of money spent on books (Book
$), videos (Video $), music (Music $), and affiliates (Partner $). In addition,
the entry point (EP) to the Web site is included as a dependent variable. 
Based on the weightings generated for June 1, 2000 (record 99), book revenue 
has the strongest influence on total sales while affiliate revenue has the least. 
However, the weightings generated for July 1, 2000 (record 100) reflect a shift 
in the relative effect of book revenue to a shared influence between book and 
video revenue. Other matrices and statistical weighting models, including 
tree, multidimensional and neural network models, are feasible. 
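
To make the idea of shifting weightings concrete, the sketch below compares two records. It is not the statistical weighting of the described embodiment: a simple share-of-total calculation stands in for the data mining step, and every dollar figure is hypothetical rather than taken from FIGURE 6.

```python
from typing import Dict


def relative_weightings(record: Dict[str, float]) -> Dict[str, float]:
    """Share of Total $ attributed to each revenue variable (a stand-in for real weightings)."""
    total = record["Total $"]
    return {name: value / total for name, value in record.items() if name != "Total $"}


def strongest(weights: Dict[str, float]) -> str:
    """Name of the variable with the largest weighting."""
    return max(weights, key=weights.get)


# Hypothetical figures; the values of FIGURE 6 are not reproduced here.
june = {"Total $": 1000.0, "Book $": 600.0, "Video $": 200.0,
        "Music $": 150.0, "Partner $": 50.0}
july = {"Total $": 1000.0, "Book $": 400.0, "Video $": 400.0,
        "Music $": 150.0, "Partner $": 50.0}

june_w = relative_weightings(june)
july_w = relative_weightings(july)

# A positive value means the variable gained influence between the two records.
shift = {name: july_w[name] - june_w[name] for name in june_w}
print(strongest(june_w), shift)
```

With these made-up figures, book revenue dominates the first record and shares its influence with video revenue in the second, which mirrors the kind of shift the text describes.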

FIGURE 7 is a block diagram showing the functional software
modules of the server 11 of FIGURE 2. Each module is a computer procedure
or program written as source code in a conventional programming language,
such as the C++ programming language, and is presented for execution by the
CPU as object or byte code, as is known in the art. The various
implementations of the source code and object and byte codes can be held on a
computer-readable storage medium or embodied on a transmission medium in
a carrier wave. The server 11 operates in accordance with a sequence of
process steps, as further described below beginning with reference to FIGURE
8.

The click stream analyzer 22 consists of five main modules: data
reduction 111, classification 112, analyzer 113, comparison and validation
114, and modeler 115. The data reduction module 111 helps to minimize the
size of click stream paths stored in the click streams database 36 (shown in
FIGURE 2) by compressing and generating inferences. The classification
module 112 categorizes the individual click stream paths into pre-defined
classes. The analyzer module 113 consists of an analytical processing
submodule 116 and data mining submodule 117. These submodules respectively
execute pre-defined queries and statistical analyses on the data maintained in
the click streams database 36. The comparison and validation module 114
compares the results of the analyzer module 113 to the demographic models
38 and presentation models 39 and determines
whether these models are still valid or require further evaluation. Finally, the
modeler module 115 generates new demographic and presentation models.
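
The kind of pre-defined query run by the analytical processing submodule 116 might look like the sketch below. The table name `clicks`, its columns, the category labels, and the sample rows are all assumptions for illustration; an in-memory SQLite database stands in for the relational database of the described embodiment.

```python
import sqlite3

# In-memory database standing in for the click streams database 36.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clicks (session_id TEXT, category TEXT, position INTEGER, url TEXT)"
)

# Hypothetical classified click stream entries, one row per click.
rows = [
    ("s1", "purchase", 0, "/home.htm"),
    ("s1", "purchase", 1, "/buy.htm"),
    ("s2", "incomplete_purchase", 0, "/home.htm"),
    ("s2", "incomplete_purchase", 1, "/search.htm"),
    ("s3", "incomplete_purchase", 0, "/home.htm"),
]
conn.executemany("INSERT INTO clicks VALUES (?, ?, ?, ?)", rows)

# Pre-defined query: how many sessions fall into each category?
sessions_per_category = dict(
    conn.execute(
        "SELECT category, COUNT(DISTINCT session_id) FROM clicks GROUP BY category"
    ).fetchall()
)
print(sessions_per_category)
```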

The ASP server 21 consists primarily of an interpreter 118 for
executing the ASP scripts embedded within Web page code. The interpreter
118 executes any ASP scripts encountered while requested Web pages are
being served. The results of the ASP script executions are forwarded to the
requesting browser applications 26 (shown in FIGURE 1) as ordinary Web
page code, generally consisting of plain HTML or XML.

FIGURE 8 is a flow diagram showing a method for dynamically 
evaluating an e-commerce business model through click stream analysis 130 
in accordance with the present invention. The method 130 operates in two 
phases. During the first phase, initialization (block 131), the ASP server 21 
and click stream analyzer 22 (both shown in FIGURE 1) are booted and
initialized. 

During the second phase, operation, e-commerce transactions are
processed and tracked in two iterative threads of execution (blocks 132-135).
In a first thread, the server 11 executes ASP scripts and serves Web pages
(block 133) in response to consumer requests. In a second thread, click
streams are collected and analyzed (block 134), as further described below
with respect to FIGURE 9. The threads (blocks 132-135) execute
continuously until the method 130 terminates, either upon the processing of a
last incoming request or upon the receipt of a terminate signal.
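
The two concurrent threads can be pictured with the sketch below. It assumes, purely for illustration, that the serving thread hands each click to the collecting thread through a shared queue and that a sentinel value plays the role of the terminate signal; the function names and sample requests are hypothetical.

```python
import queue
import threading

clicks: queue.Queue = queue.Queue()
collected = []
TERMINATE = None  # sentinel standing in for the terminate signal


def serve_pages(requests):
    # First thread: "serve" each requested page and report the click.
    for session_id, url in requests:
        clicks.put((session_id, url))
    clicks.put(TERMINATE)  # last incoming request has been processed


def collect_click_streams():
    # Second thread: collect clicks while pages are still being served.
    while True:
        item = clicks.get()
        if item is TERMINATE:
            break
        collected.append(item)


requests = [("s1", "/home.htm"), ("s1", "/search.htm"), ("s2", "/home.htm")]
server = threading.Thread(target=serve_pages, args=(requests,))
collector = threading.Thread(target=collect_click_streams)
server.start()
collector.start()
server.join()
collector.join()
print(collected)
```

Decoupling the two threads through a queue keeps page serving from waiting on analysis, which is one plausible reading of collecting click streams concurrently with on-going server operations.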

FIGURE 9 is a flow diagram showing a routine for collecting and
analyzing click streams 140 for use in the method of FIGURE 8. The purpose
of this routine is to store data stream paths as data vectors 80 (shown in
FIGURE 5) and process those stored paths. Each click stream through the Web
site for each session is stored as a data vector 80 (block 141). In turn, the data
vectors 80 are stored in the click streams database 36 (shown in FIGURE 2).
The size of each data vector 80 is reduced (block 142), as further described
below with reference to FIGURE 10. Each click stream path is then classified
according to pre-defined user categories (block 143) and analyzed (block 144),
as further described below with reference to FIGURES 11 and 12,
respectively. Finally, the e-commerce business models incorporated into the
Web site are validated (block 145), as further described below with reference
to FIGURE 13. If the e-commerce business models need to be rebuilt, as
apparent from behavioral shifts or changes in weightings of dependent
variables (block 146), new models are generated (block 147). Preferably,
quantitative thresholds indicate when the models need to be rebuilt, but other
measures are feasible. The new models are created using the same
methodologies with which they were originally built, but using the updated
demographic information stored in the demography database 35 (shown in
FIGURE 2). The methodologies for building demographic models 38 and
presentation models 39 are known to those skilled in the art. If there are more
consumer sessions (block 148), processing continues as before (blocks 141- 
147). Otherwise, the routine returns.

FIGURE 10 is a flow diagram showing a routine for reducing data
vector sizes 160 for use in the method of FIGURE 9. Reducing the data vector
sizes is an optional step, but helps to minimize the storage requirements of
the data vectors 80 generated by every consumer transaction session. Each
data path vector 80 is retrieved from the click streams database 36 (block
161) and reduced in size by compression (blocks 162-166) or inference
(blocks 167-169).

Compression removes repeated data entries from each data vector 80. 
For example, the data entries for the Search page 82-84 in the data path vector 
80 of FIGURE 5 can be compressed into a single data entry for the Search 
page. Thus, if the retrieved data vector 80 is to be compressed (block 162), 
the next data entry in the data vector 80 is obtained (block 163). If the data 
entry is repeated (and is not the first occurrence of that particular URL) (block 
164), the data entry is removed from the data vector 80 (block 165). 
Compression is repeatedly performed on each remaining data entry in the data 
vector 80 (block 166). 
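
The compression loop just described can be illustrated with a short Python
sketch. It assumes, for illustration only, that a data vector is a plain list
of page names; the function name compress_vector and every page name other
than Search are hypothetical.

```python
def compress_vector(entries):
    """Remove every repeated entry, keeping only the first occurrence of each."""
    seen = set()
    compressed = []
    for entry in entries:       # obtain the next data entry (block 163)
        if entry in seen:       # repeated and not the first occurrence (block 164)
            continue            # drop it from the vector (block 165)
        seen.add(entry)
        compressed.append(entry)
    return compressed


path = ["Home", "Search", "Search", "Search", "Product", "Search", "Checkout"]
print(compress_vector(path))
```

Under these assumptions, the repeated Search entries collapse into a single
entry and the order of first visits is preserved.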

Inference summarizes a data vector 80 into a succinct set of
descriptors, such as "cause and effect," "offer and acceptance," "completed
order versus entry point," and the like. More precisely, an example of a
"cause and effect" inference would be maintaining a tally of the number of
searches performed before a purchase is made. An example of an "offer and
acceptance" inference would be grouping data vectors 80 based on whether a
consumer made a purchase based on a solicited sale. Finally, an example of
a "completed order versus entry point" inference would be tracking those
entry points into a Web site from which a purchase is ultimately made. Thus,
if the retrieved data vector 80 is to be inferred (block 167), a list of inferences,
preferably maintained as an ancillary file 25 (shown in FIGURE 1), is
retrieved (block 168) and all appropriate inferences are generated (block 169).
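
As an illustration of the "cause and effect" and "completed order versus
entry point" inferences, the following Python sketch reduces a data vector to
two descriptors. It is a sketch under stated assumptions: the data vector is
taken to be a list of page names, and the names generate_inferences, Search
and Purchase are hypothetical rather than taken from the ancillary file 25.

```python
def generate_inferences(entries, search_page="Search", purchase_page="Purchase"):
    """Summarize one data vector as a small set of descriptors."""
    if purchase_page not in entries:
        # No purchase was made, so neither descriptor applies.
        return {"searches_before_purchase": None, "entry_point_of_order": None}
    cutoff = entries.index(purchase_page)  # position of the first purchase
    return {
        # "cause and effect": searches performed before the purchase
        "searches_before_purchase": entries[:cutoff].count(search_page),
        # "completed order versus entry point": where the buying session began
        "entry_point_of_order": entries[0],
    }


print(generate_inferences(["Home", "Search", "Search", "Product", "Purchase"]))
```

The resulting descriptors are much smaller than the data vector itself, which
is the sense in which inference reduces the data vector sizes.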

If there are more data path vectors (block 170), processing continues as 
before (blocks 161-169). Otherwise, the routine returns. 

FIGURE 11 is a flow diagram showing a routine for classifying a click
stream path 180 for use in the method of FIGURE 9. The purpose of this
routine is to categorize each data stream path according to a user-defined set
of categories, making the data stream paths more amenable to data mining,
as further described below with reference to FIGURE 12. Unlike generating
inferences, this routine does not attempt to reduce the data vector sizes.
Rather, each data path vector 80 is retrieved from the click streams database 36
(block 181) and classified by category (block 182). In the described
embodiment, the click stream paths are classified into ten main categories, as
follows:

(1) Completed order (block 183): click stream paths ending with a 
product or service purchase. 

(2) Length (block 184): number of "clicks" in the click stream 
path. 

(3) Booleans (block 185): based on whether a given condition 
exists within the click stream path, such as whether a particular 
search engine or tool was used. 

(4) Cross sales (block 186): consumer entered the Web site from an 
affiliate Web site. 

(5) Aborted order (block 187): click stream paths ending without a 
product or service purchase. 

(6) Entry point (block 188): initial Web site in data stream path. 

(7) Exit point (block 189): last Web site in data stream path.

(8) Product (block 190): type of product (or service) purchased, if 
any. 

(9) Styles (block 191): style of product (or service) purchased, if 
any. 

(10) Searched by (block 192): search categories used.

Other classification categories are feasible. If there are more data path vectors 
(block 193), processing continues as before (blocks 181-192). Otherwise, the 
routine returns. 
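
The ten categories above might be computed for a single session as in the
following Python sketch. The session layout (a dictionary with clicks,
purchase, from_affiliate and search_categories keys) and the function name
classify_path are assumptions made for illustration only; the described
embodiment does not specify a record format.

```python
def classify_path(session):
    """Classify one click stream path into the ten main categories."""
    clicks = session["clicks"]            # list of (web_site, page) pairs
    purchase = session.get("purchase")    # None when the order was aborted
    return {
        "completed_order": purchase is not None,                    # (1)
        "length": len(clicks),                                      # (2)
        "booleans": {                                               # (3)
            "used_search": any(page == "Search" for _, page in clicks),
        },
        "cross_sale": session.get("from_affiliate", False),         # (4)
        "aborted_order": purchase is None,                          # (5)
        "entry_point": clicks[0][0] if clicks else None,            # (6)
        "exit_point": clicks[-1][0] if clicks else None,            # (7)
        "product": purchase["product"] if purchase else None,       # (8)
        "style": purchase["style"] if purchase else None,           # (9)
        "searched_by": sorted(set(session.get("search_categories", []))),  # (10)
    }
```

Each category becomes one named field, so the classified paths can be loaded
directly into a table for the analysis described with reference to FIGURE 12.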

FIGURE 12 is a flow diagram showing a routine for analyzing click
stream paths 200 for use in the method of FIGURE 9. The purpose of this
routine is to quantify the data stream paths into objective measures. The data
stream paths are analyzed through analytic processing (blocks 201-203) and
data mining (blocks 204-207). Other click stream path analysis techniques are
feasible.

Analytic processing involves executing queries on the click stream
paths stored in the click streams database 36 (shown in FIGURE 2). The
queries are preferably written in a standardized query language, such as the
Structured Query Language (SQL), and can be single or multiple statement
queries. For example, a single statement query for determining a hierarchical
listing of the effective sales entry points into a vendor Web site is:
    Select $, Entry-point
    From Click
    Where $ > 200
    Group by Entry-point
    Order by $

where Entry-point is the initial Web site in the data stream path and Click is
the click streams database 36. This query statement returns those Web site
entry points from the click streams database 36 whose sales exceed
$200.00. As a further example, a multiple statement query for determining
combined book and video sales sold from the same product category is:

    Select $, Prod_book, Id
    From Click
    Where $ > 200
    Insert into Temp1
    Select $, Prod_video, Id
    From Click
    Where $ > 200
    Insert into Temp2
    Select Book, Video, $, Id
    From Temp1, Temp2
    Where Id, Temp1 and Temp2
    And where Book.Prod.Id = Video.Prod.Id

where Click is the click streams database 36, Prod_book and Prod_video are
product types, Id is a product category, Temp1 and Temp2 store interim query
results, and Book.Prod.Id and Video.Prod.Id are specific instances of book
and video products falling into the same product category. These query
statements return those book and video sales categories in the click streams
database 36 whose sales exceed $200.00.
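
The two example queries are written in an abbreviated notation. A runnable
approximation, using Python's built-in sqlite3 module, is sketched below. The
table layout (entry_point, product_type, category_id, amount), the sample
rows, and the function names are assumptions made for illustration. The
sketch also reads the "$ > 200" condition of the first query as applying to
the total sales per entry point, which is one possible interpretation.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory stand-in for the click streams database."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE click (entry_point TEXT, product_type TEXT, "
        "category_id INTEGER, amount REAL)"
    )
    con.executemany(
        "INSERT INTO click VALUES (?, ?, ?, ?)",
        [
            ("portal.example", "book", 1, 250.0),
            ("portal.example", "video", 1, 300.0),
            ("search.example", "book", 2, 120.0),
            ("search.example", "video", 2, 400.0),
            ("ads.example", "book", 3, 50.0),
        ],
    )
    return con


def effective_entry_points(con, minimum=200):
    """Entry points whose total sales exceed the minimum, best first."""
    return con.execute(
        "SELECT entry_point, SUM(amount) AS sales FROM click "
        "GROUP BY entry_point HAVING SUM(amount) > ? ORDER BY sales DESC",
        (minimum,),
    ).fetchall()


def combined_book_video_sales(con, minimum=200):
    """Book and video sales over the minimum that share a product category."""
    return con.execute(
        "SELECT b.category_id, b.amount, v.amount "
        "FROM click AS b JOIN click AS v ON b.category_id = v.category_id "
        "WHERE b.product_type = 'book' AND v.product_type = 'video' "
        "AND b.amount > ? AND v.amount > ? ORDER BY b.category_id",
        (minimum, minimum),
    ).fetchall()
```

The self-join in the second function plays the role of the interim tables
Temp1 and Temp2 in the multiple statement query.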

Data mining involves creating a matrix 90 (shown in FIGURE 6) of
independent and dependent variables 91-97 and determining their relative
weights. Numerical correlations are created between
unique data entries in the click stream paths using advanced statistical analysis
tools, such as the Oracle Darwin data mining tool, licensed by Oracle
Corporation, Redwood Shores, California.

Thus, if analytic processing is selected (block 201), the pre-defined
query statements are retrieved and executed against the click streams database
36 (block 202) and the results are analyzed (block 203). Similarly, if data
mining is selected (block 204), an independent variable 97 and dependent
variables 91-96 are selected (block 205). The relative weightings of the
dependent variables 91-96 are generated (block 206) and the results are analyzed
(block 207). The routine then returns.
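
The described embodiment relies on a commercial data mining tool to generate
the relative weightings. As a rough illustration of the idea only, the Python
sketch below substitutes a much simpler technique: it computes the Pearson
correlation of each remaining variable with the selected variable and
normalizes the absolute correlations so that they sum to one. The function
names and the sample variable names are hypothetical, and this is not the
method used by the Oracle Darwin tool.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long columns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant column carries no correlation
    return cov / (spread_x * spread_y)


def relative_weightings(matrix, selected):
    """Weight every other variable by its correlation with the selected one."""
    reference = matrix[selected]
    strengths = {
        name: abs(pearson(column, reference))
        for name, column in matrix.items()
        if name != selected
    }
    total = sum(strengths.values())
    return {name: (s / total if total else 0.0) for name, s in strengths.items()}


matrix = {
    "purchase": [0, 0, 1, 1],        # selected variable, one value per session
    "searches": [1, 2, 3, 4],
    "from_affiliate": [1, 0, 1, 0],
}
print(relative_weightings(matrix, "purchase"))
```

A change in these weightings between two runs is the kind of shift that the
validation routine of FIGURE 13 compares against a threshold.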

FIGURE 13 is a flow diagram showing a routine for validating e- 
commerce business models 210 for use in the method of FIGURE 9. The 
purpose of this routine is to determine whether the e-commerce business 
models currently in use require updating and reevaluation. Behavioral shifts 
are quantitative measures that equate performance to behavior. For instance, 
an increase in purchasing as measured by dollar revenue might indicate 
effective presentation models 39 (shown in FIGURE 2). Similarly, longer 
sessions as measured by click stream path lengths with long series of 
unsuccessful searches might indicate poor presentation models 39. 

Weighting shifts are changes in the dependent variables 91-96 that 
exceed some pre-determined thresholds. Changes to the demographic models 
38 (shown in FIGURE 2) represent shifts in the customer segments that occur 
relatively slowly over time. 

Thus, behavioral shifts (block 211) indicate a need to evaluate the 
presentation models 39 (block 212) while shifts in the weightings of 
dependent variables 91-96 (block 213) indicate a need to evaluate the 
demographic models 38 (block 214). Upon the completion of the evaluations, 
the routine returns. 
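
The two validation checks might be sketched in Python as follows. The choice
of dollar revenue as the behavioral measure follows the example given above,
but the function name validate_models, the parameter names, and the default
threshold values are assumptions made for illustration only.

```python
def validate_models(revenue_before, revenue_after,
                    weights_before, weights_after,
                    behavior_threshold=0.10, weight_threshold=0.05):
    """Flag which business models need to be evaluated."""
    # Behavioral shift: relative change in dollar revenue (block 211).
    if revenue_before:
        behavioral_shift = abs(revenue_after - revenue_before) / revenue_before
    else:
        behavioral_shift = 0.0
    # Weighting shift: largest change in any dependent variable (block 213).
    weighting_shift = max(
        (abs(weights_after.get(name, 0.0) - weight)
         for name, weight in weights_before.items()),
        default=0.0,
    )
    return {
        "evaluate_presentation_models": behavioral_shift > behavior_threshold,  # block 212
        "evaluate_demographic_models": weighting_shift > weight_threshold,      # block 214
    }


print(validate_models(1000.0, 1200.0,
                      {"searches": 0.50, "entry_point": 0.50},
                      {"searches": 0.52, "entry_point": 0.48}))
```

In this example the revenue change exceeds its threshold while the weighting
change does not, so only the presentation models would be flagged.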

Using the approach of the present invention, e-commerce business
models, including demographic and presentation models, can be continually
evaluated and, if necessary, regenerated on the fly using click
stream analysis. The raw click stream paths through a Web site are classified,
analyzed and correlated to quantitative measures. These measures can be used
to validate the e-commerce business models while data is still fresh and
relevant.

While the invention has been particularly shown and described as
referenced to the embodiments thereof, those skilled in the art will understand
that the foregoing and other changes in form and detail may be made therein
without departing from the spirit and scope of the invention.

CLAIMS 

1. A system (10) for transacting electronic commerce via a
hierarchically structured Web site with click stream feedback, comprising:
    a server (21) serving a plurality of individual Web pages (32)
structured in a hierarchical manner as a Web site (60), each Web page (32)
comprising one or more hyperlinks (50) selectable by a user to provide at least
one of another Web page (32) and content;
    a click stream analyzer (22) collecting a click stream (36) for each user
session on the Web site (60), the click stream (36) comprising data entries
recording one or more of the hyperlinks (50) selected during the user session
and processing each collected click stream (36), comprising:
        a classifier (112) classifying the collected click stream (36) into
one or more pre-defined categories (90) describing characteristics shared by a
plurality of the click streams (36); and
        an analyzer (113) analyzing the collected click stream (36),
comprising at least one of:
            an analytic processor (116) executing one or more
structured queries (50) on the collected click streams (36) based on at least one
of the data entries and the shared characteristics; and
            a data mining tool (117) performing data mining on the
collected click streams (36) by assigning an independent variable (97) and at
least one dependent variable (91) and determining the relative weightings (90)
thereon.

2. A system according to Claim 1, further comprising:
    executable scripts (32) embedded within one or more of the Web pages
(32); and
    an interpreter (118) within the server (11) executing the executable
scripts (32) as each Web page (32) is served.

3. A system according to Claim 1, wherein the Web pages (32)
are written in a tag-delimited page description language and the executable
scripts (32) are written in an interpretable server-executable language.

4. A system according to Claim 1, further comprising:
    a data reducer (111) reducing the size of the collected click streams
(36), comprising at least one of:
        a compressor (111) compressing the collected click streams
(36) by removing repetitious data entries (99); and
        an inference engine (111) generating inferences from the data
entries (99) of each collected click stream (36).

5. A system according to Claim 1, further comprising:
    one or more business models (37) maintained in a database (24) and
describing at least one of a demographic model (38) and a presentation model
(39), each such business model (37) being incorporated into the Web pages
(32); and
    a validator (114) validating each such business model (37) by
comparing the business model (37) to the processed click streams (36),
comprising at least one of:
        a presentation model evaluator (114) identifying behavioral
shifts from results of the structured queries (50); and
        a demographic model evaluator (114) determining shifts in the
relative weightings of the dependent variables (91).

6. A system according to Claim 5, further comprising:
    a modeler (115) generating a new business model (37) when at least
one of the behavioral shifts and relative weighting shifts exceed a
pre-determined quantitative threshold.

7. A method for transacting electronic commerce via a
hierarchically structured Web site with click stream feedback, comprising:
    serving a plurality of individual Web pages (32) structured in a
hierarchical manner as a Web site (60), each Web page (32) comprising one or
more hyperlinks (50) selectable by a user to provide at least one of another
Web page (32) and content;
    collecting a click stream (36) for each user session on the Web site
(60), the click stream (36) comprising data entries recording one or more of
the hyperlinks (50) selected during the user session;
    processing each collected click stream (36), comprising:
        classifying the collected click stream (36) into one or more
pre-defined categories (96) describing characteristics shared by a plurality of
the click streams (36); and
        analyzing the collected click stream (36), comprising at least
one of:
            executing one or more structured queries (56) on the
collected click streams (36) based on at least one of the data entries and the
shared characteristics; and
            performing data mining on the collected click streams
(36) by assigning an independent variable (97) and at least one dependent
variable (91) and determining the relative weightings (90) thereon.

8. A method according to Claim 7, further comprising:
    embedding executable scripts (32) within one or more of the Web
pages (32); and
    executing the executable scripts (32) as each Web page (32) is served.

9. A method according to Claim 7, wherein the Web pages (32)
are written in a tag-delimited page description language and the executable
scripts (32) are written in an interpretable server-executable language.

10. A method according to Claim 7, further comprising:
    reducing the size of the collected click streams (36), comprising at
least one of:
        compressing the collected click streams (36) by removing
repetitious data entries (99); and
        generating inferences from the data entries of each collected
click stream (36).

11. A method according to Claim 7, further comprising:
    maintaining one or more business models (37) describing at least one
of a demographic model and a presentation model (38), each such business
model (37) being incorporated into the Web pages (32); and
    validating each such business model (37) by comparing the business
model (37) to the processed click streams (36), comprising at least one of:
        identifying behavioral shifts from results of the structured
queries (50); and
        determining shifts in the relative weightings of the dependent
variables (91).

12. A method according to Claim 11, further comprising:
    generating a new business model (37) when at least one of the
behavioral shifts and relative weighting shifts exceed a pre-determined
quantitative threshold.

13. A computer-readable storage medium holding code for
performing the method according to Claims 7, 8, 9, 10, 11, or 12.

14. A system for dynamically evaluating an electronic commerce
business model (37) through click stream analysis, comprising:
    a Web site (60) comprising a plurality of related Web pages (32)
structured in a hierarchical manner (60) and incorporating an electronic
commerce business model (37), each Web page (32) comprising one or more
hyperlinks (50) selectable by a user;
    a database (24) storing a plurality of data vectors (80) that each
represent a click stream path (50) through the Web site (32), each data vector
(80) comprising a set of data entries (81) which each corresponds to Web
content (33) selected via the hyperlinks in the related Web pages (32); and
    a click stream (22) analyzer, comprising:
        a classification module (112) classifying each click stream path
(50) based on at least one such data entry (81) in the data vector (80), the
classified click stream (50) path sharing at least one common characteristic
with one or more other click stream paths (50);
        an analyzer module (113) analyzing the classified click stream
paths (50) according to a pre-defined evaluation procedure directed at
determining at least one of efficacy of presentation (39) and shifts in
demography (38); and
        a comparison module (114) comparing the electronic
commerce business model (37) to the classified click stream paths analysis.

15. A system according to Claim 13, the click stream (36) analyzer
further comprising:
    a data reduction module (111) reducing the size of the data vectors (80)
prior to classification.

16. A system according to Claim 14, further comprising:
    the data reduction module (111) compressing the data vectors (80) to
reduce the data vector sizes by removing each data entry (81) corresponding to
a selected Web page (32) which is repeated within the data vector (80).

17. A system according to Claim 14, further comprising:
    the data reduction module (111) generating inferences from at least one
data entry (81) in the data vector (80) to reduce the data vector sizes.

18. A system according to Claim 13, wherein the at least one
common characteristic is selected from the set of characteristics comprising
completed orders, path length, Boolean operations, cross-sales, aborted orders,
origin Web site, exit Web site, product, styles, cause and effect, and search
category.

19. A system according to Claim 13, wherein the pre-defined
evaluation procedure further comprises:
    an analytic processing module (116) analytically processing the click
stream paths (50) by performing a series of one or more queries on the data
vectors (80).

20. A system according to Claim 13, wherein the pre-defined
evaluation procedure further comprises:
    a data mining module (117) performing data mining on the data
vectors (80).

21. A system according to Claim 19, further comprising:
    the data mining module (117) assigning at least one independent
variable (97) based on one such data entry in the data vectors, assigning one or
more dependent variables (91) based on the remaining data entries in the data
vectors (80), and determining relative weightings for the dependent variables
(91) through an analysis of the at least one independent variable (97).

22. A system according to Claim 13, wherein the electronic
commerce business model (37) further comprises a performance model, the
system further comprising:
    a validation module (114) evaluating the performance model against
the classified click stream paths analysis.

23. A system according to Claim 13, wherein the electronic
commerce business model (37) further comprises a demographic model, the
system further comprising:
    a validation module (114) evaluating the demographic model against
the classified click stream paths analysis.

24. A system according to Claim 13, further comprising:
    the click stream analyzer (22) setting a validation threshold for the
electronic commerce business model (37) being validated; and
    a modeler module (115) generating a new electronic commerce
business model (37) when the validation threshold is exceeded in the
comparison to the classified click stream paths analysis.

25. A method for dynamically evaluating an electronic commerce
business model (37) through click stream analysis, comprising:
    incorporating an electronic commerce business model (37) into a Web
site (60) comprising a plurality of related Web pages (32) structured in a
hierarchical manner (60), each Web page (32) comprising one or more
hyperlinks (50) selectable by a user;
    storing a plurality of data vectors (80) that each represent a click
stream path (50) through the Web site (32), each data vector (80) comprising a
set of data entries (81) which each corresponds to Web content (33) selected
via the hyperlinks in the related Web pages (32);
    classifying each click stream path (50) based on at least one such data
entry (81) in the data vector (80), the classified click stream path (50) sharing
at least one common characteristic with one or more other click stream paths
(50);
    analyzing the classified click stream paths (50) according to a
pre-defined evaluation procedure directed at determining at least one of
efficacy of presentation (39) and shifts in demography (38); and
    comparing the electronic commerce business model (37) to the
classified click stream paths analysis.

26. A method according to Claim 24, further comprising:
    reducing the size of the data vectors (80) prior to classification.

27. A method according to Claim 25, further comprising:
    compressing the data vectors to reduce the data vector (80) sizes by
removing each data entry (81) corresponding to a selected Web page (32)
which is repeated within the data vector (80).

28. A method according to Claim 25, further comprising:
    generating inferences from at least one data entry (81) in the data
vector (80) to reduce the data vector sizes.

29. A method according to Claim 24, wherein the at least one
common characteristic is selected from the set of characteristics comprising
completed orders, path length, Boolean operations, cross-sales, aborted orders,
origin Web site, exit Web site, product, styles, cause and effect, and search
category.

30. A method according to Claim 24, wherein the pre-defined
evaluation procedure further comprises:
    analytically processing the click stream paths (50) by performing a
series of one or more queries on the data vectors (80).

31. A method according to Claim 24, wherein the pre-defined
evaluation procedure further comprises:
    performing data mining on the data vectors (80).

32. A method according to Claim 30, further comprising:
    assigning at least one independent variable (97) based on one such data
entry in the data vectors;
    assigning one or more dependent variables (91) based on the remaining
data entries in the data vectors (80); and
    determining relative weightings for the dependent variables (91)
through an analysis of the at least one independent variable (97).

33. A method according to Claim 24, wherein the electronic
commerce business model (37) further comprises a performance model, the
method further comprising:
    evaluating the performance model against the classified click stream
paths analysis.

34. A method according to Claim 24, wherein the electronic
commerce business model (37) further comprises a demographic model, the
method further comprising:
    evaluating the demographic model against the classified click stream
paths analysis.

35. A method according to Claim 24, further comprising:
    setting a validation threshold for the electronic commerce business
model (37) being validated; and
    generating a new electronic commerce business model (37) when the
validation threshold is exceeded in the comparison to the classified click
stream (36) paths analysis.

36. A computer-readable storage medium holding code for
performing the method according to Claims 24, 25, 26, 27, 28, 29, 30, 31, 32,
33 or 34.

Figure 2.

Figure 7.